Causal Effects and Experiments

GVPT399F: Power, Politics, and Data

Causes and effects

Our goal is to better understand what factors lead to certain outcomes of interest.

  • Does increasing the number of voting booths close to a potential voter make that person more likely to vote?

  • Do peace treaties signed with factionalized rebel groups more often lead to a return to conflict than those signed with a single, cohesive group?

  • Does trade between two countries make war between them less likely?

Causes and effects

These are causal statements:

  • More local voting booths \(\rightarrow\) More likely to vote

  • More factionalization \(\rightarrow\) More likely to restart conflict

  • Trade \(\rightarrow\) Less likely to go to war

How to identify causes and their effects

Proving that changes to one factor cause changes to another is very tricky!

  • We need to account for all the other factors that could also be driving the outcome

The efficacy of international election monitors

Do international monitors cause less election-day fraud in democratic elections?

  • Number of elections monitored by international observers increased throughout the 2000s

  • But do monitors increase the chances the election will be free and fair?

  • In other words, are they effective?

The efficacy of international election monitors

Dr Susan Hyde set out to answer this very question. From her article (page 39):

If the presence of international observers causes a reduction in election-day fraud, the effect of observers should be visible at the subnational level by comparing polling stations that were visited by observers with those that were not visited. More specifically, if international monitoring reduces election-day fraud directly, all else held equal, the cheating parties should gain less of their ill-gotten vote share in polling stations that were visited by international monitors.

Causal relationships

A causal relationship is the directional connection between a change in one variable and a corresponding change in another.

  • The direction matters!

  • Treatment variable: the variable causing changes to another variable.

  • Outcome variable: the variable changing as a result of changes to another variable (the treatment).

Treatment and outcome variables

We want to test whether the presence of international monitors (treatment) leads to less election-day fraud (outcome).

International election monitors present at polling stations \(\rightarrow\) Less election-day fraud at that station

Treatment variable

At any given polling station in any given election, monitors may be: 1) present, or 2) not present.

Two different conditions:

  • Treatment: condition with treatment (monitors are present)

  • Control: condition without treatment (monitors are not present)

Outcome variable

At any given polling station in any given election, fraud may: 1) occur, or 2) not occur.

Observing our outcome

Sometimes it can be hard to observe our outcome of interest.

  • How might we see all election-day fraud?

  • How might we measure it?

Observing our outcome

Hyde’s answer: vote share!

International election monitors present at polling stations \(\rightarrow\) Less election-day fraud at that station \(\rightarrow\) Lower vote share for the cheating party

Individual causal effects

We want to know whether the treatment causes a change in our outcome of interest.

Individual causal effects

What might this look like in the real world?

  • Imagine we are looking at a specific election

  • 100 polling stations

  • Measure the vote share received by the cheating party

Hypothetical election

# A tibble: 100 × 3
   polling_station_id vote_share_monitored vote_share_not_monitored
                <int>                <dbl>                    <dbl>
 1                  1                 34.9                     85.9
 2                  2                 58.8                     86.3
 3                  3                 74.3                     92.0
 4                  4                 29.4                     96.0
 5                  5                 25.0                     86.7
 6                  6                 58.3                     97.0
 7                  7                 49.2                     93.4
 8                  8                 54.5                     91.7
 9                  9                 28.4                     80.0
10                 10                 33.1                     84.4
# ℹ 90 more rows

Individual effects

The only difference between these two conditions is the presence of international monitors!

  • So any difference between the vote shares under these two conditions must be caused by the monitors.

Individual effects

# A tibble: 100 × 4
   polling_station_id vote_share_monitored vote_share_not_monitored difference
                <int>                <dbl>                    <dbl>      <dbl>
 1                  1                 34.9                     85.9      -51.0
 2                  2                 58.8                     86.3      -27.6
 3                  3                 74.3                     92.0      -17.7
 4                  4                 29.4                     96.0      -66.6
 5                  5                 25.0                     86.7      -61.7
 6                  6                 58.3                     97.0      -38.7
 7                  7                 49.2                     93.4      -44.2
 8                  8                 54.5                     91.7      -37.2
 9                  9                 28.4                     80.0      -51.6
10                 10                 33.1                     84.4      -51.3
# ℹ 90 more rows
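
As a minimal sketch of how the difference column could be computed, the Python snippet below re-enters the ten displayed rows by hand. The names `stations` and `individual_effects` are illustrative and do not come from the course materials, and Python is used here purely for illustration. Because the sketch starts from the rounded values printed above, a result can differ from the table in the last digit (station 2 comes out as -27.5 rather than -27.6).

```python
# First ten polling stations from the hypothetical election:
# (station id, vote share if monitored, vote share if not monitored)
stations = [
    (1, 34.9, 85.9),
    (2, 58.8, 86.3),
    (3, 74.3, 92.0),
    (4, 29.4, 96.0),
    (5, 25.0, 86.7),
    (6, 58.3, 97.0),
    (7, 49.2, 93.4),
    (8, 54.5, 91.7),
    (9, 28.4, 80.0),
    (10, 33.1, 84.4),
]

# Individual causal effect for each station:
# vote share in the monitored world minus vote share in the non-monitored world.
individual_effects = {
    station_id: round(monitored - not_monitored, 1)
    for station_id, monitored, not_monitored in stations
}

for station_id, effect in individual_effects.items():
    print(station_id, effect)
```

Every effect is negative: in this hypothetical election, the cheating party does worse at each station when monitors are present.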

No parallel worlds

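Sadly for us, we cannot create parallel worlds. Each polling station is either monitored or not, so we only ever observe one of its two vote shares.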

# A tibble: 100 × 4
   polling_station_id monitored vote_share_monitored vote_share_not_monitored
                <int>     <int>                <dbl>                    <dbl>
 1                  1         1                 34.9                     NA  
 2                  2         0                 NA                       86.3
 3                  3         0                 NA                       92.0
 4                  4         1                 29.4                     NA  
 5                  5         0                 NA                       86.7
 6                  6         0                 NA                       97.0
 7                  7         1                 49.2                     NA  
 8                  8         0                 NA                       91.7
 9                  9         1                 28.4                     NA  
10                 10         1                 33.1                     NA  
# ℹ 90 more rows
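
The step from the parallel-worlds table to this one can be sketched as follows: keep the outcome that matches each station's actual condition and blank out the other. This is a hedged Python illustration using the ten displayed rows; `full` and `observed` are made-up names, and `None` stands in for the `NA` values printed above.

```python
# (station id, monitored flag, vote share if monitored, vote share if not monitored)
# The flags and vote shares are copied from the ten displayed rows.
full = [
    (1, 1, 34.9, 85.9),
    (2, 0, 58.8, 86.3),
    (3, 0, 74.3, 92.0),
    (4, 1, 29.4, 96.0),
    (5, 0, 25.0, 86.7),
    (6, 0, 58.3, 97.0),
    (7, 1, 49.2, 93.4),
    (8, 0, 54.5, 91.7),
    (9, 1, 28.4, 80.0),
    (10, 1, 33.1, 84.4),
]

# In reality we only see the outcome for the condition that actually happened.
observed = [
    (
        station_id,
        monitored,
        share_monitored if monitored == 1 else None,
        share_not_monitored if monitored == 0 else None,
    )
    for station_id, monitored, share_monitored, share_not_monitored in full
]

for row in observed:
    print(row)
```

With one value missing in every row, the individual difference can no longer be computed for any station.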

What now?

# A tibble: 100 × 5
   polling_station_id monitored vote_share_monitored vote_share_not_monitored
                <int>     <int>                <dbl>                    <dbl>
 1                  1         1                 34.9                     NA  
 2                  2         0                 NA                       86.3
 3                  3         0                 NA                       92.0
 4                  4         1                 29.4                     NA  
 5                  5         0                 NA                       86.7
 6                  6         0                 NA                       97.0
 7                  7         1                 49.2                     NA  
 8                  8         0                 NA                       91.7
 9                  9         1                 28.4                     NA  
10                 10         1                 33.1                     NA  
# ℹ 90 more rows
# ℹ 1 more variable: difference <dbl>

Average causal effects

We need to move away from looking at individuals and start to look for patterns in our group.

Back to our parallel worlds

# A tibble: 100 × 4
   polling_station_id vote_share_monitored vote_share_not_monitored difference
                <int>                <dbl>                    <dbl>      <dbl>
 1                  1                 34.9                     85.9      -51.0
 2                  2                 58.8                     86.3      -27.6
 3                  3                 74.3                     92.0      -17.7
 4                  4                 29.4                     96.0      -66.6
 5                  5                 25.0                     86.7      -61.7
 6                  6                 58.3                     97.0      -38.7
 7                  7                 49.2                     93.4      -44.2
 8                  8                 54.5                     91.7      -37.2
 9                  9                 28.4                     80.0      -51.6
10                 10                 33.1                     84.4      -51.3
# ℹ 90 more rows

Difference of averages across all individuals

What was the average vote share received in each world?

# A tibble: 1 × 3
  vote_share_monitored vote_share_not_monitored difference
                 <dbl>                    <dbl>      <dbl>
1                 39.5                     83.4      -43.9
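
In potential-outcomes notation (the symbols are an assumption of this sketch; the slides themselves use none), write \(Y_i(1)\) for station \(i\)'s vote share if monitored and \(Y_i(0)\) for its vote share if not monitored. The average effect across the \(N = 100\) stations is the average of the individual differences, which equals the difference of the two averages:

```latex
\[
\frac{1}{N}\sum_{i=1}^{N}\bigl(Y_i(1) - Y_i(0)\bigr)
  = \frac{1}{N}\sum_{i=1}^{N} Y_i(1) \;-\; \frac{1}{N}\sum_{i=1}^{N} Y_i(0)
  = 39.5 - 83.4 = -43.9
\]
```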

Back to reality

# A tibble: 100 × 5
   polling_station_id monitored vote_share_monitored vote_share_not_monitored
                <int>     <int>                <dbl>                    <dbl>
 1                  1         1                 34.9                     NA  
 2                  2         0                 NA                       86.3
 3                  3         0                 NA                       92.0
 4                  4         1                 29.4                     NA  
 5                  5         0                 NA                       86.7
 6                  6         0                 NA                       97.0
 7                  7         1                 49.2                     NA  
 8                  8         0                 NA                       91.7
 9                  9         1                 28.4                     NA  
10                 10         1                 33.1                     NA  
# ℹ 90 more rows
# ℹ 1 more variable: difference <dbl>

Difference-of-means

What was the average vote share received in each group?

# A tibble: 1 × 3
  vote_share_monitored vote_share_not_monitored difference
                 <dbl>                    <dbl>      <dbl>
1                 38.6                     82.2      -43.6
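
A difference-of-means calculation on observed data might look like the Python sketch below. It uses only the ten displayed stations, so its result will not match the 100-station figures above. The variable names are illustrative, not taken from the course materials.

```python
from statistics import mean

# (station id, monitored flag, observed vote share) for the ten displayed rows.
observed = [
    (1, 1, 34.9), (2, 0, 86.3), (3, 0, 92.0), (4, 1, 29.4), (5, 0, 86.7),
    (6, 0, 97.0), (7, 1, 49.2), (8, 0, 91.7), (9, 1, 28.4), (10, 1, 33.1),
]

# Average observed vote share within each group.
monitored_mean = mean(share for _, flag, share in observed if flag == 1)
not_monitored_mean = mean(share for _, flag, share in observed if flag == 0)

# Difference-of-means: treatment group average minus control group average.
difference_of_means = monitored_mean - not_monitored_mean

print(round(monitored_mean, 2), round(not_monitored_mean, 2), round(difference_of_means, 2))
```

For these ten rows the group averages are about 35.0 and 90.7, a difference of roughly -55.7. With all 100 stations the estimate settles much closer to the parallel-worlds answer.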

How on Earth does this work so well?

Randomization!

  • Monitors were assigned to polling stations randomly (for example, with the flip of a coin)

  • This created two groups of stations that were, on average, roughly identical to one another prior to treatment

  • This mimics what happens when we split our world into two (creating two literally identical groups)
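
To see why random assignment recovers the average effect, the Python simulation below may help. Everything in it is an assumption made for illustration: the potential outcomes are made-up numbers drawn from uniform ranges, and monitors are assigned to exactly half of the stations at random rather than by individual coin flips. It repeats the random assignment many times and compares the difference-of-means estimates with the true average effect, which is only knowable here because the simulation invents both worlds.

```python
import random
from statistics import mean

rng = random.Random(399)  # fixed seed so the sketch is reproducible

# Hypothetical potential outcomes for 100 stations (made-up numbers).
n_stations = 100
y_monitored = [rng.uniform(25, 75) for _ in range(n_stations)]
y_not_monitored = [rng.uniform(80, 97) for _ in range(n_stations)]

# The "parallel worlds" answer: average of one world minus average of the other.
true_average_effect = mean(y_monitored) - mean(y_not_monitored)


def difference_of_means(treated_ids):
    """Estimate the effect using only the outcome each station actually reveals."""
    treated = set(treated_ids)
    treated_mean = mean(y_monitored[i] for i in treated)
    control_mean = mean(
        y_not_monitored[i] for i in range(n_stations) if i not in treated
    )
    return treated_mean - control_mean


# Re-run the random assignment 1,000 times, monitoring half the stations each time.
estimates = [
    difference_of_means(rng.sample(range(n_stations), n_stations // 2))
    for _ in range(1000)
]
average_estimate = mean(estimates)

print("true average effect:", round(true_average_effect, 2))
print("average of estimates:", round(average_estimate, 2))
print("range of estimates:", round(min(estimates), 2), "to", round(max(estimates), 2))
```

Any single random assignment gives an estimate that is a little off, but the estimates cluster around the true average effect. This is the sense in which randomization stands in for the parallel worlds.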

Do international monitors deter election-day fraud?

Yes!

  • The international community monitored the 2003 Armenian presidential elections
  • Monitors were assigned randomly to the polling stations
  • Hyde found a large average difference between the vote share the cheating party received at monitored stations and the share it received at non-monitored stations.